Deep learning for 1-bit compressed sensing-based superimposed CSI feedback

In frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems, 1-bit compressed sensing (CS)-based superimposed channel state information (CSI) feedback has shown many advantages, but it still faces challenges such as low accuracy of downlink CSI recovery and large processing delays. To overcome these drawbacks, this paper proposes a deep learning (DL) scheme to improve 1-bit CS-based superimposed CSI feedback. On the user side, the downlink CSI is compressed with the 1-bit CS technique, superimposed on the uplink user data sequences (UL-US), and then sent back to the base station (BS). At the BS, based on the model-driven approach and assisted by the superimposition-interference cancellation technology, a multi-task detection network is first constructed for detecting both the UL-US and downlink CSI. In particular, this detection network is jointly trained to detect the UL-US and downlink CSI simultaneously, yielding a globally optimized set of network parameters. Then, with the recovered bits for the downlink CSI, a lightweight reconstruction scheme, which consists of an initial feature extraction of the downlink CSI with a simplified traditional method and a single hidden layer network, is utilized to reconstruct the downlink CSI with low processing delay. Compared with the 1-bit CS-based superimposed CSI feedback scheme, the proposed scheme improves the recovery accuracy of the UL-US and downlink CSI with lower processing delay and is robust against parameter variations.


Introduction
Massive multiple-input multiple-output (MIMO) has become a key technology of fifth generation (5G) wireless communication systems owing to its advantages in system capacity and link robustness [1,2]. To realize these advantages, the base station (BS) needs accurate downlink channel state information (CSI), on which it relies for precoding [3], antenna selection [4], radio resource allocation [5], and communication interference management [6]. In time-division duplex (TDD) mode, the downlink CSI can be obtained from the uplink CSI by exploiting channel reciprocity [7,8]. In frequency-division duplex (FDD) mode, channel reciprocity is difficult to exploit because the uplink and downlink use different frequency bands [9,10]. Thus, in FDD massive MIMO systems the downlink CSI is usually estimated by users and fed back to the BS [9]. However, due to the large number of antennas in massive MIMO systems, CSI feedback incurs significant feedback overhead and seriously occupies uplink bandwidth.
To reduce feedback overhead, many compressive sensing (CS)-based CSI feedback methods have emerged [11][12][13][14]. In recent years, deep learning (DL)-based CSI feedback methods [15][16][17] have been proposed to further reduce feedback overhead. Although the feedback overhead is reduced to some extent, both CS-based and DL-based CSI feedback still occupy significant uplink bandwidth resources. To avoid occupying uplink bandwidth resources, superimposed CSI feedback was proposed in [18], yet it causes mutual interference due to the superimposition operation. In [10,19,20], 1-bit CS-based, DL-based, and extreme learning machine (ELM)-based superimposed CSI feedback schemes are respectively proposed to reduce this mutual interference. Inspired by the advantages of superimposed CSI feedback based on 1-bit CS and DL, we propose a DL-based 1-bit superimposed CSI feedback scheme in this paper.
For reducing feedback overhead, DL-based data-driven CSI feedback can be divided into two categories. The first category combines the CS technique with the DL technique, while the other category applies the DL technique to quantized data. In the first category, [21] is the first application of DL to CSI feedback. In [21], the CSI feedback was mainly based on a convolutional neural network called CsiNet, which achieved superior performance over various CS-based CSI feedback methods. Yet time correlation, frequency correlation, spatial correlation, feedback delay, and feedback errors were not considered in CsiNet, which limited its applications. To remedy these defects, some improvements have been proposed in [22][23][24]. In [22], a CsiNet long short-term memory (CsiNet-LSTM) network was proposed by exploiting the time correlation, which is suitable for practical application in time-varying channels. The recurrent neural network-based CsiNet in [23] was developed to capture the temporal and frequency correlations of wireless channels. Considering the spatial correlation among antennas, the bidirectional LSTM (Bi-LSTM) and bidirectional convolutional LSTM (Bi-ConvLSTM) were proposed in [24]. The second category of DL-based CSI feedback reduction is mainly based on the quantization operation, e.g., [25,26]. In [25], a bit-level CsiNet+ was proposed, which made the CSI feedback network applicable in real communication systems and minimized the introduced quantization distortion to improve the reconstruction quality. By embedding quantization and entropy coding blocks into a fully convolutional network, the work of [26] obtained a drastic improvement in CSI reconstruction quality even at extremely low feedback rates.
Although the DL-based CSI feedback in [21][22][23][24][25][26] has achieved significant improvements in feedback reduction compared with the CS-based approaches, the uplink bandwidth resources were still seriously occupied due to the massive MIMO scenarios.
To avoid the occupation of uplink bandwidth resources, superimposed CSI feedback schemes were proposed in [18][19][20]. In [18], the downlink CSI was spread and then superimposed on the uplink user data sequences (UL-US) as feedback to the BS, while the recoveries of the UL-US and downlink CSI were deteriorated by superimposition interference. To remedy this defect, a DL-based superimposed CSI feedback was proposed in [19], and an ELM-based superimposed CSI feedback with lower computational complexity was proposed in [20].
Considering simplicity and cost-effectiveness, low-cost CSI feedback using 1-bit CS was studied in [27], where the 1-bit operation discards the signal amplitude and retains only its sign information. In that work, the downlink CSI was quantized by 1-bit CS to achieve low-cost feedback, but the feedback still occupied uplink bandwidth resources. To remedy this defect, superimposed CSI feedback and the 1-bit CS technique were combined in [10], which brought advantages such as avoiding the occupation of uplink bandwidth resources and reducing mutual interference. However, this scheme still faces challenges in recovery accuracy and processing delay [28].
By integrating the promising advantage of deep learning and inspired by the superimposed CSI feedback by using 1-bit CS in [10], we propose a DL-based 1-bit superimposed CSI feedback scheme in this paper. First, the downlink CSI is compressed by the 1-bit CS technique and then superimposed on the UL-US as feedback to the BS. At the BS, to recover the bit information for both the UL-US and downlink CSI, a multi-task detection network with transmitted signal feature extraction is first constructed. Then, with the recovered bits of the downlink CSI, a lightweight reconstruction network, which consists of an initial feature extraction of the downlink CSI with simplified traditional method and a single hidden layer network, is utilized to reconstruct the downlink CSI with a low processing delay. Specifically, the advantages of superimposed CSI feedback by using 1-bit CS are inherited, i.e., without any occupation of uplink bandwidth for CSI feedback, and effective interference cancellation in [10], and the recovery accuracies for both the UL-US and downlink CSI are improved.

Contributions
In this paper, a DL-based 1-bit superimposed CSI feedback scheme is proposed to improve the 1-bit CS-based superimposed CSI feedback approach in [10]. To the best of our knowledge, there is little literature on DL-based 1-bit superimposed CSI feedback, and no prior research has introduced deep learning into 1-bit superimposed feedback. The main contributions of this paper are as follows:
• We propose a DL-based scheme for 1-bit CS-based superimposed CSI feedback. By using the nonlinear mapping and feature extraction ability of DL, we develop a detection network and a reconstruction network to further suppress nonlinear superimposition interference and improve the detection and reconstruction performance. The proposed scheme retains the advantages of 1-bit CS-based superimposed CSI feedback [10], while obtaining better recovery accuracy for both the UL-US and downlink CSI with much lower processing delay.
• We construct a multi-task detection network to recover the bit information for both the UL-US and downlink CSI, based on the model-driven approach and assisted by the superimposition-interference cancellation technology. This detection network is jointly trained to detect the UL-US and downlink CSI simultaneously, yielding a globally optimized set of network parameters. We use the ability of DL to solve nonlinear problems to handle the superimposition separation, which shortens the processing delay while improving the detection performance without any second-order statistical information about the channel and noise.
• We develop a lightweight reconstruction network by using the linear approximation ability of the traditional superimposed coding aided binary iterative hard thresholding (SCA-BIHT) algorithm and the advantages of deep learning to deal with nonlinear problems. In this network, the initial feature of downlink CSI is extracted by SCA-BIHT algorithm with only a few iterations, and then a single hidden layer refinement network is constructed to refine the downlink CSI reconstruction. The reconstruction network not only greatly reduces the iterations of the traditional SCA-BIHT algorithm to raise efficiency, but also obtains a better reconstruction performance of the downlink CSI with a lower processing delay.
The remainder of this paper is structured as follows: In Section II, we introduce the system model of the 1-bit superimposed CSI feedback. The DL-based 1-bit superimposed CSI feedback method is presented in Section III and followed by numerical results in Section IV. Finally, Section V concludes our work.
Notations: Boldface upper case and lower case letters denote matrices and vectors, respectively. $(\cdot)^T$ and $(\cdot)^{\dagger}$ denote the transpose and the matrix pseudo-inverse, respectively. $\mathbf{I}_P$ is the identity matrix of size $P \times P$. $\mathrm{BN}(\cdot)$ denotes the operation of batch normalization. $\|\cdot\|_2$ is the Euclidean norm. $\mathrm{sign}(\cdot)$ denotes an operator that takes the sign information, e.g., the sign function returns +1 for positive numbers and 0 otherwise. $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ represent the real and imaginary part operations, respectively. $K(\mathbf{x})$ represents computing the best k-term approximation of $\mathbf{x}$ by thresholding. $\odot$ denotes the Hadamard product of two vectors or matrices.

System model
The system model is shown in Fig 1. Consider a massive MIMO system that consists of one BS with N antennas and U single-antenna users. After matched filtering, the received signal from user-u, u = 1, 2, ..., U, denoted as $\mathbf{R}_u$, is given as
$$\mathbf{R}_u = \mathbf{g}_u \mathbf{x}_u + \mathbf{N}_u, \tag{1}$$
where $\mathbf{g}_u \in \mathbb{C}^{N \times 1}$ denotes the uplink channel vector from user-u to the BS, $\mathbf{N}_u \in \mathbb{C}^{N \times P}$ is the circularly symmetric complex Gaussian (CSCG) noise of the feedback link, and P is the length of the UL-US. To avoid occupying the limited and crowded uplink bandwidth resources [29,30], the transmitted signal of user-u, $\mathbf{x}_u \in \mathbb{C}^{1 \times P}$, adopts the superimposition technology and is given by [10]
$$\mathbf{x}_u = \sqrt{\rho E_u}\,\mathbf{s}_u + \sqrt{(1-\rho) E_u}\,\mathbf{d}_u, \tag{2}$$
where $\rho \in [0, 1]$ is the power proportional coefficient of the downlink CSI, $E_u$ is the transmitted power of user-u, and $\mathbf{s}_u \in \mathbb{C}^{1 \times P}$ and $\mathbf{d}_u \in \mathbb{C}^{1 \times P}$ stand for the modulated superimposition signal and the UL-US, respectively. In this paper, the downlink CSI $\mathbf{h}_u \in \mathbb{C}^{1 \times N}$ is a sparse vector with K-sparsity [10], i.e., there are only K non-zero elements in $\mathbf{h}_u$. According to the 1-bit CS technique [31], $\mathbf{h}_u$ is

PLOS ONE
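compressed by
$$\mathbf{y}_{\mathrm{real},u} = \mathrm{sign}\left(\mathrm{Re}(\mathbf{h}_u)\boldsymbol{\Phi}_u\right), \quad \mathbf{y}_{\mathrm{imag},u} = \mathrm{sign}\left(\mathrm{Im}(\mathbf{h}_u)\boldsymbol{\Phi}_u\right), \tag{3}$$
where $\boldsymbol{\Phi}_u \in \mathbb{R}^{N \times M}$ is the measurement matrix [10], and $\mathbf{y}_{\mathrm{real},u} \in \mathbb{R}^{1 \times M}$ and $\mathbf{y}_{\mathrm{imag},u} \in \mathbb{R}^{1 \times M}$ denote the real and imaginary parts of the compressed CSI, respectively. For the convenience of digital modulation, the support-set of the downlink CSI $\mathbf{h}_u$, denoted as $\mathbf{z}_u \in \{0, 1\}^{1 \times N}$, is labelled in bit form [10], i.e.,
$$z_{u,k} = \begin{cases} 1, & h_{u,k} \neq 0, \\ 0, & h_{u,k} = 0, \end{cases} \tag{4}$$
where $z_{u,k}$ and $h_{u,k}$ are the k-th elements of $\mathbf{z}_u$ and $\mathbf{h}_u$, respectively. In order to reconstruct a more accurate downlink CSI at the BS, $\mathbf{z}_u$ needs to be fed back to the BS together with $\mathbf{y}_{\mathrm{real},u}$ and $\mathbf{y}_{\mathrm{imag},u}$ by using the feedback vector $\mathbf{p}_u$. The feedback vector $\mathbf{p}_u$ is formed by merging $\mathbf{y}_{\mathrm{real},u}$, $\mathbf{y}_{\mathrm{imag},u}$, and $\mathbf{z}_u$ [10], i.e.,
$$\mathbf{p}_u = \left[\mathbf{y}_{\mathrm{real},u}, \mathbf{y}_{\mathrm{imag},u}, \mathbf{z}_u\right]. \tag{5}$$
It is worth noting that $\mathbf{p}_u$ can be viewed as a bit stream whose elements are only 0 or 1. With the digital modulation, we have
$$\mathbf{w}_u = f_{\mathrm{modu}}(\mathbf{p}_u), \tag{6}$$
where $f_{\mathrm{modu}}(\cdot)$ denotes the mapping function of digital modulation, such as quadrature phase shift keying (QPSK). In Eq (6), $\mathbf{p}_u$ is mapped to the modulated feedback vector (MFV) $\mathbf{w}_u \in \mathbb{C}^{1 \times L}$, where $L = \lceil (2M + N)/2 \rceil$. Without loss of generality, the UL-US length P is larger than L because user services are the main task [19,20]. Similar to [10,20], to superimpose the MFV on the UL-US, a spread spectrum method is utilized, whose spreading gain suppresses the interference caused by the superimposition processing. Thus, the superimposition signal $\mathbf{s}_u$, given in Eq (2), is obtained by using a spreading matrix to spread the MFV $\mathbf{w}_u$, i.e.,
$$\mathbf{s}_u = \mathbf{w}_u \mathbf{Q}_u, \tag{7}$$
where $\mathbf{Q}_u \in \mathbb{R}^{L \times P}$ is a spreading matrix that satisfies $\mathbf{Q}_u \mathbf{Q}_u^T = P\,\mathbf{I}_L$, e.g., the Walsh matrix [32]. By combining Eqs (2) and (7), the transmitted signal of user-u, $\mathbf{x}_u$, is rewritten as
$$\mathbf{x}_u = \sqrt{\rho E_u}\,\mathbf{w}_u \mathbf{Q}_u + \sqrt{(1-\rho) E_u}\,\mathbf{d}_u. \tag{8}$$
At user-u, the downlink CSI $\mathbf{h}_u$ is compressed by using 1-bit CS (given in Eq (3)), and the transmitted signal $\mathbf{x}_u$ is then formed by weighting and superimposing the UL-US $\mathbf{d}_u$ and the superimposition signal $\mathbf{s}_u$ according to Eqs (2)-(8).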
With the received R u at the BS, the detection network and reconstruction network are designed to detect the UL-US d u and superimposition signal s u , and recover the downlink CSI h u , respectively. The detection and reconstruction networks will be deliberated in Section III.
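The user-side processing of Eqs (3)-(8) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names `walsh_rows`, `qpsk_mod`, and `build_tx` are hypothetical, the QPSK bit-to-symbol mapping is an assumed Gray-style mapping, the Walsh matrix is taken as the first L rows of a Sylvester-Hadamard matrix, and the spreading is assumed to be unnormalized, i.e., s = wQ.

```python
import numpy as np

def walsh_rows(L, P):
    """First L rows of a P x P Sylvester-Hadamard matrix (P must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < P:
        H = np.block([[H, H], [H, -H]])
    return H[:L, :]

def qpsk_mod(bits):
    """Assumed mapping: bit 0 -> +1, bit 1 -> -1 on each rail, unit symbol power."""
    bits = np.asarray(bits)
    if bits.size % 2:                       # pad to an even number of bits
        bits = np.append(bits, 0)
    rails = 1.0 - 2.0 * bits.reshape(-1, 2)
    return (rails[:, 0] + 1j * rails[:, 1]) / np.sqrt(2.0)

def build_tx(h, Phi, d, rho, E):
    """Form the superimposed transmit signal from CSI h and data d."""
    # 1-bit CS: sign() returns 1 for positive entries and 0 otherwise
    y_real = (np.real(h) @ Phi > 0).astype(int)
    y_imag = (np.imag(h) @ Phi > 0).astype(int)
    z = (h != 0).astype(int)                # support-set in bit form
    p = np.concatenate([y_real, y_imag, z]) # feedback bit stream
    w = qpsk_mod(p)                         # modulated feedback vector (MFV)
    Q = walsh_rows(w.size, d.size)          # L x P spreading matrix
    s = w @ Q                               # spread MFV
    x = np.sqrt(rho * E) * s + np.sqrt((1.0 - rho) * E) * d
    return x, w, Q, p
```

With N = 64, M = 128, and P = 512, the bit stream has 2M + N = 320 entries, so the MFV has L = 160 QPSK symbols, each spread over the 512 UL-US positions.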

DL-based superimposed CSI feedback using 1-bit CS
In this section, according to the superimposed CSI feedback scheme with the 1-bit CS [10], the detection network and reconstruction network are developed to recover the UL-US and downlink CSI. A transmitted signal feature extraction is first employed to coarsely extract the feature after equalizing the uplink wireless channel. Then, with the extracted transmitted signal feature, we design the detection network and reconstruction network.

Transmitted signal feature extraction
From Eqs (2)-(8), the transmitted signal $\mathbf{x}_u$ is formed by superimposing the UL-US $\mathbf{d}_u$ and the modulated superimposition signal $\mathbf{s}_u$. To recover $\mathbf{d}_u$ and $\mathbf{s}_u$, the transmitted signal $\mathbf{x}_u$ should first be extracted, and thus the uplink channel $\mathbf{g}_u$ in Eq (1) needs to be removed by channel equalization. Following [10,19], a transmitted signal feature extraction is employed in this paper. That is, the uplink wireless channel is equalized through zero forcing (ZF) equalization so as to extract the transmitted signal feature. The feature extraction is given as
$$\dot{\mathbf{x}}_u = \mathbf{g}_u^{\dagger}\mathbf{R}_u, \tag{9}$$
where $\dot{\mathbf{x}}_u$ denotes the coarsely extracted vector of the transmitted signal $\mathbf{x}_u$. It should be noted that, relative to ZF equalization, minimum mean square error (MMSE) channel equalization can obtain better feature extraction performance, but at a higher computational complexity. In particular, MMSE equalization requires second-order statistics of the uplink channel $\mathbf{g}_u$ and the noise $\mathbf{N}_u$ [10,18], which makes it difficult to apply. Therefore, we use low-complexity ZF equalization to extract the transmitted signal feature, leaving the feature improvement to the subsequent detection network.
With the extracted transmitted signal feature _ x u , we construct the detection network to detect UL-US d u and superimposition signal s u . From Eq (7), s u is obtained by spreading the MFV w u . In addition, the compressed downlink CSI y real,u and y imag,u can be recovered from w u (given in Eqs (3)-(6)).
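As a minimal sketch of the ZF feature extraction described above, the pseudo-inverse of the uplink channel vector can be applied to the received matrix. The function name `zf_extract` is hypothetical, and perfect knowledge of the uplink channel at the BS is assumed here.

```python
import numpy as np

def zf_extract(R, g):
    """Coarse transmit-signal feature: pinv(g) @ R.

    R : (N, P) received matrix, g : (N, 1) uplink channel vector.
    For a column vector g, pinv(g) equals g^H / (g^H g).
    """
    return (np.linalg.pinv(g) @ R).ravel()
```

In the noise-free case this returns the transmitted signal exactly; with noise, the residual error is what the subsequent detection network is expected to clean up.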

Detection network
In order to eliminate superimposed interference and obtain better downlink CSI and UL-US reconstruction accuracy, the detection network is designed by using unfolding method [33]. That is, the iteration steps in [10] are replaced by the groups of CSI-Net and Det-Net, including six subnets, i.e., CSI-Net1, Det-Net1, CSI-Net2, Det-Net2, CSI-Net3, and Det-Net3, in which the UL-US d u and MFV w u are detected by solving a multi-task problem.
Architecture. The architecture of the detection network is illustrated in Fig 2. For convenience and ease of implementation, we use the simplest single hidden layer neural network architecture to design CSI-Neti and Det-Neti (i = 1, 2, 3). Experimental verification shows that this architecture is not only simple but also improves performance. The architecture of the detection network is described as follows:
• CSI-Net1, Det-Net1, CSI-Net2, Det-Net2, CSI-Net3, and Det-Net3 are successively cascaded to form the multi-task network. To reduce mutual interference, expert knowledge, i.e., the interference cancellation technology [18,19], is inserted between cascaded subnets. In more detail, the CSI interference reduction (CSI IR) is introduced between CSI-Neti and Det-Neti (i = 1, 2, 3), while the UL-US interference reduction (UL-US IR) is inserted between Det-Neti and CSI-Net(i + 1) (i = 1, 2).
• The same network structure is employed by CSI-Neti and Det-Neti (i = 1, 2, 3). Each subnet consists of an input layer, a hidden layer, and an output layer in fully connected mode. For each CSI-Neti (Det-Neti) (i = 1, 2, 3), the numbers of neurons in the input layer, hidden layer, and output layer are 2L (2P), 4L (4P), and 2L (2P), respectively.
• For each subnet, a batch normalization (BN) is employed to normalize its input sets, converting the subnet input to zero mean and unit variance.
• The activation functions of linear activation, leaky rectified linear unit (LReLU) [34] and hyperbolic tangent (Tanh) are adopted by the input layer, hidden layer and output layer of each subnet, respectively.
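• The outputs of CSI-Net3 and Det-Net3 are the detected MFV $\dot{\mathbf{w}}_u$ and the detected UL-US $\dot{\mathbf{d}}_u$, respectively.
The network architecture is summarized in Table 1.
Process of detection network.
• Data preprocessing. Because common DL frameworks require real-valued data sets, we transform the coarsely extracted complex-valued vector $\dot{\mathbf{x}}_u$, the UL-US $\mathbf{d}_u$, and the MFV $\mathbf{w}_u$ into real-valued vectors, i.e.,
$$\dot{\mathbf{x}}_u^{\mathrm{real}} = \left[\mathrm{Re}(\dot{\mathbf{x}}_u), \mathrm{Im}(\dot{\mathbf{x}}_u)\right], \quad \mathbf{d}_u^{\mathrm{real}} = \left[\mathrm{Re}(\mathbf{d}_u), \mathrm{Im}(\mathbf{d}_u)\right], \quad \mathbf{w}_u^{\mathrm{real}} = \left[\mathrm{Re}(\mathbf{w}_u), \mathrm{Im}(\mathbf{w}_u)\right]. \tag{10}$$
To match the real-valued operation, the spreading matrix $\mathbf{Q}_u$ is also transformed into a real-valued matrix $\dot{\mathbf{Q}}_u$, which is obtained as
$$\dot{\mathbf{Q}}_u = \begin{bmatrix} \mathbf{Q}_u & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}_u \end{bmatrix}. \tag{11}$$
Then, to train the detection network, $\dot{\mathbf{x}}_u^{\mathrm{real}}$ is employed as the network input, while $\mathbf{w}_u^{\mathrm{real}}$ and $\mathbf{d}_u^{\mathrm{real}}$ are used as the training labels of CSI-Neti and Det-Neti (i = 1, 2, 3), respectively. In addition, to facilitate a unified description of the sub-network inputs in the detection network, we use $\hat{\mathbf{w}}_u^{(1)}$ to represent the input $\dot{\mathbf{x}}_u^{\mathrm{real}}$ of the detection network, i.e., $\hat{\mathbf{w}}_u^{(1)} = \dot{\mathbf{x}}_u^{\mathrm{real}}$.
• Processing procedure. The processing procedure of the trained detection network is given in Table 2, and some steps are explained as follows.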

CSI IR: In steps (1-2), (2-2), and (3-2) in Table 2, to reduce the interference from the MFV, a spreading is employed by CSI IR, in which $\dot{\mathbf{Q}}_u$ is obtained according to Eq (11). Then, $\hat{\mathbf{d}}_u^{(i)}$ is fed into Det-Neti to detect the UL-US.
Process of Det-Neti: The Det-Neti (i = 1, 2, 3) is used to detect the UL-US from $\hat{\mathbf{d}}_u^{(i)}$.
UL-US IR: In steps (1-4) and (2-4) in Table 2, to reduce the interference from the UL-US, the outputs of Det-Neti are processed by expert knowledge.
With the process given in Table 2, the UL-US $\mathbf{d}_u$ and MFV $\mathbf{w}_u$ are detected, where the real-valued descriptions of the detected UL-US $\mathbf{d}_u$ and MFV $\mathbf{w}_u$ are denoted by $\dot{\mathbf{d}}_u$ and $\dot{\mathbf{w}}_u$, respectively. Then, with the detected MFV $\dot{\mathbf{w}}_u$, we develop the reconstruction network to recover the downlink CSI $\mathbf{h}_u$.
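The real-valued preprocessing and the forward pass of one single hidden layer subnet can be sketched as below. Several details are assumptions made only for illustration: the real-valued spreading matrix is taken to be block diagonal (which follows from the spreading matrix being real), batch normalization is applied with batch statistics and no learned scale or shift, and the LReLU slope of 0.2 is a placeholder. The names `to_real`, `real_spreading`, and `subnet_forward` are hypothetical.

```python
import numpy as np

def to_real(v):
    """Stack real and imaginary parts: [Re(v), Im(v)]."""
    return np.concatenate([v.real, v.imag])

def real_spreading(Q):
    """Block-diagonal real-valued spreading matrix, shape (2L, 2P)."""
    Z = np.zeros_like(Q)
    return np.block([[Q, Z], [Z, Q]])

def subnet_forward(x, W1, b1, W2, b2, eps=1e-5, slope=0.2):
    """One subnet: BN -> linear input layer -> LReLU hidden layer -> Tanh output.

    x : (batch, n_in), W1 : (n_in, 2 * n_in), W2 : (2 * n_in, n_in).
    """
    # Batch normalization to zero mean and unit variance per feature
    xn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    hidden = xn @ W1 + b1
    hidden = np.where(hidden > 0, hidden, slope * hidden)   # leaky ReLU
    return np.tanh(hidden @ W2 + b2)                        # outputs lie in [-1, 1]
```

Because the output layer uses Tanh, each output coordinate can be read as a soft decision on one QPSK rail, which matches the bit-level labels used for training.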

Reconstruction network
A reconstruction network is designed to further improve the reconstruction accuracy of h u on the basis of the reconstruction algorithm, and to reduce the processing delay caused by multiple iterations of the reconstruction algorithm. The reconstruction network is given in Fig 3, and the processing procedure is summarized in Table 3. Generally, the corresponding de-mapping is first employed to restore the compressed downlink CSI. Then, the reconstruction algorithm given in [10] with reduced complexity is utilized to perform an initial feature extraction of the downlink CSI. According to the initial feature extraction, two dense layers are used to refine the reconstruction of the downlink CSI. These details will be presented as follows.

Inverse mapping operation. From Eqs (6) and (10), the real-valued $\mathbf{w}_u^{\mathrm{real}}$ is formed by digital modulation and the mapping from complex-valued to real-valued form. Correspondingly, we adopt an inverse mapping to recover the complex-valued and unmodulated forms. An inverse mapping, denoted by $f_{\mathbb{R} \to \mathbb{C}}(\cdot)$, is first employed to map the real-valued $\dot{\mathbf{w}}_u$ back to its complex-valued form. Then, the digital demodulation mapping, denoted as $f_{\mathrm{demo}}(\cdot)$, is used to demodulate this complex-valued vector. The whole inverse mapping process is expressed as
$$\left[\dot{\mathbf{y}}_{\mathrm{real},u}, \dot{\mathbf{y}}_{\mathrm{imag},u}, \dot{\mathbf{z}}_u\right] = f_{\mathrm{demo}}\left(f_{\mathbb{R} \to \mathbb{C}}(\dot{\mathbf{w}}_u)\right). \tag{16}$$
Then, the estimate of the sparsity K of the downlink CSI, denoted as $\dot{K}$, is obtained by counting the non-zero entries in $\dot{\mathbf{z}}_u$.
Initial feature extraction. With $\dot{\mathbf{y}}_{\mathrm{real},u}$, $\dot{\mathbf{y}}_{\mathrm{imag},u}$, $\dot{\mathbf{z}}_u$, and $\dot{K}$, we employ the reconstruction algorithm, named SCA-BIHT in [19], to conduct an initial feature extraction of the downlink CSI, while leaving the refinement to the subsequent refinement network. In particular, this initial feature extraction is executed by SCA-BIHT with only a few iterations instead of the dozens or hundreds of iterations in [10]. Here, β iterations are adopted in this paper. The initial feature extraction procedure is presented in Table 4.
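The inverse mapping and a few-iteration initial estimate can be sketched as follows. This is not the SCA-BIHT algorithm of the cited work, whose steps are listed in Table 4; it is a generic binary iterative hard thresholding (BIHT) loop used as a stand-in, applied to one real-valued part at a time. The QPSK demapping (negative rail means bit 1), the step size `tau`, and all function names are assumptions.

```python
import numpy as np

def real_to_complex(v):
    """Undo [Re, Im] stacking."""
    half = v.size // 2
    return v[:half] + 1j * v[half:]

def qpsk_demod(sym):
    """Hard decisions per rail; assumed mapping: negative rail -> bit 1."""
    bits = np.empty(2 * sym.size, dtype=int)
    bits[0::2] = sym.real < 0
    bits[1::2] = sym.imag < 0
    return bits

def inverse_map(w_dot, M, N):
    """Split the detected MFV into y_real, y_imag, support z, and sparsity estimate."""
    bits = qpsk_demod(real_to_complex(w_dot))[:2 * M + N]
    z = bits[2 * M:]
    return bits[:M], bits[M:2 * M], z, int(z.sum())

def best_k_term(x, K):
    """Keep the K largest-magnitude entries and zero the rest."""
    out = np.zeros_like(x)
    if K > 0:
        keep = np.argsort(np.abs(x))[-K:]
        out[keep] = x[keep]
    return out

def biht(y_bits, Phi, K, n_iter=3, tau=1.0):
    """Generic BIHT for y = sign(x @ Phi), Phi of shape (N, M)."""
    y = 2.0 * np.asarray(y_bits) - 1.0          # {0, 1} -> {-1, +1}
    M = Phi.shape[1]
    x = np.zeros(Phi.shape[0])
    for _ in range(n_iter):
        resid = y - np.where(x @ Phi > 0, 1.0, -1.0)
        x = best_k_term(x + (tau / M) * (resid @ Phi.T), K)
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 0 else x            # amplitude is lost in 1-bit CS
```

Since 1-bit measurements discard amplitude, the estimate is returned with unit norm; running only a few iterations mirrors the idea of leaving the remaining error to the refinement network.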
Table 3. Processing procedure of the reconstruction network.
1): Map the real-valued $\dot{\mathbf{w}}_u$ to $[\dot{\mathbf{y}}_{\mathrm{real},u}, \dot{\mathbf{y}}_{\mathrm{imag},u}, \dot{\mathbf{z}}_u]$.
2): Use $[\dot{\mathbf{y}}_{\mathrm{real},u}, \dot{\mathbf{y}}_{\mathrm{imag},u}, \dot{\mathbf{z}}_u]$ to coarsely extract the feature of the downlink CSI and obtain $\hat{\mathbf{h}}_u$ by the SCA-BIHT algorithm with β iterations.
3): Use the refinement network to refine the reconstructed downlink CSI $\dot{\mathbf{h}}_u$.

Based on the initial feature extraction, we then input $\hat{\mathbf{h}}_u$ to a single hidden layer network to refine the reconstruction accuracy of the downlink CSI $\mathbf{h}_u$.
Refinement network. According to the initial feature extraction, a single hidden layer network is employed to refine the reconstruction of the downlink CSI, and its network architecture is summarized in Table 5. Similar to CSI-Neti and Det-Neti (i = 1, 2, 3) of the detection network, the refinement network is also designed as the simplest single hidden layer neural network architecture.

The initial feature of the downlink CSI $\hat{\mathbf{h}}_u$ and the label $\mathbf{h}_u$ are complex-valued, and thus need to be mapped to real-valued form, i.e.,
$$\hat{\mathbf{h}}_u^{\mathrm{real}} = \left[\mathrm{Re}(\hat{\mathbf{h}}_u), \mathrm{Im}(\hat{\mathbf{h}}_u)\right], \quad \mathbf{h}_u^{\mathrm{real}} = \left[\mathrm{Re}(\mathbf{h}_u), \mathrm{Im}(\mathbf{h}_u)\right]. \tag{17}$$
Then, the refinement network produces the refined reconstruction of the downlink CSI, with $\mathbf{W}_{31}$ ($\mathbf{b}_{31}$) and $\mathbf{W}_{32}$ ($\mathbf{b}_{32}$) denoting the weights (biases) of the hidden layer and output layer of the refinement network, respectively.

Model training specification
Since model training is significant for network performance, we give the training details in this subsection. In the following, we discuss the training method, data preparation, and loss function, respectively. Training method. In this paper, the detection network and reconstruction network are separately trained to reduce the complexity of parameter tuning. For detection network, there are six subnetworks needed to be trained, including the training parameters W  Fig 2, the detection network is a multi-task network in reality, which generates the estimated MFV _ w u and UL-US _ d u , respectively. Thus, we jointly train the six subnets of detection network to resolve this multi-task issue. In the reconstruction network, only the refinement network needs to be trained to optimize its network parameters W 31 , W 32 , b 31 , and b 32 . With the trained detection network and the corresponding initial feature extraction of reconstruction network, we then train the refinement network solely.
Data preparation for training. The training set is acquired by simulation, in which a large number of data samples are generated to train the two networks, i.e., the detection network and the refinement network. Specifically, these data samples are generated as follows.
$\mathbf{h}_u$ and $\mathbf{g}_u$ are randomly generated on the basis of the distribution $\mathcal{CN}(0, 1/N)$. To train the detection network, we first collect $\dot{\mathbf{x}}_u$ according to Eq (9) to form the input sets. Then we save the corresponding $\mathbf{d}_u$ and $\mathbf{w}_u$ as target sets, where $\mathbf{d}_u$ is formed by QPSK modulation of randomly generated Bernoulli binary sequences. All the complex-valued data sets are converted to real-valued form. For example, the input and label of the detection network are set as $\{(\dot{\mathbf{x}}_u^{\mathrm{real}}); (\mathbf{d}_u^{\mathrm{real}}, \mathbf{w}_u^{\mathrm{real}})\}$ according to Eq (10). Similarly, the input and label of the refinement network are set as $\{(\hat{\mathbf{h}}_u^{\mathrm{real}}); (\mathbf{h}_u^{\mathrm{real}})\}$ according to Eq (17). In addition, to validate the trained network parameters during the training phase, a validation set is generated by following the same generation method as the training set, so that a set of optimized network parameters can be captured.
Loss functions. The detection network is trained by optimizing the weights and biases of each subnet, i.e., CSI-Neti and Det-Neti, to minimize the loss function [35,36]. In addition, $\ell_2$ regularization is employed in the detection network to avoid gradient explosions [37]. Thus, the loss function for training the detection network, Loss_Det in Eq (19), combines a data term $\mathrm{loss}_1$ with an $\ell_2$ penalty, where $\alpha_1$ is the regularization coefficient and $\Theta_1$ denotes the training parameters, i.e., the weights and biases of the detection network. In Eq (19), $\mathrm{loss}_1$ represents the weighted sum of the losses of the six subnets, in which $\dot{\mathbf{w}}_u^{(i)}$ and $\dot{\mathbf{d}}_u^{(i)}$ are the outputs of CSI-Neti and Det-Neti, respectively. With this detection network, we obtain the MFV $\dot{\mathbf{w}}_u^{(3)}$ and UL-US $\dot{\mathbf{d}}_u^{(3)}$, i.e., $\dot{\mathbf{w}}_u$ and $\dot{\mathbf{d}}_u$. With the trained detection network, the reconstruction network is trained according to $\dot{\mathbf{y}}_{\mathrm{real},u}$, $\dot{\mathbf{y}}_{\mathrm{imag},u}$, and $\dot{\mathbf{z}}_u$, which are detected by the detection network and expressed in Eq (16).
In the reconstruction network, only the single hidden layer refinement network needs to be optimized. Its loss function, Loss_Rec in Eq (21), involves the estimated downlink CSI $\dot{\mathbf{h}}_u$, the regularization coefficient $\alpha_2$, and $\Theta_2$, which denotes all training parameters of the refinement network.
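The structure of a jointly trained multi-task loss with an l2 penalty can be illustrated with the short sketch below. It assumes mean-squared-error terms for every subnet and equal weights by default; the exact per-subnet loss and weighting used by the authors are not reproduced here, and the function name `multitask_loss` is hypothetical.

```python
import numpy as np

def multitask_loss(w_outs, d_outs, w_label, d_label, params, alpha, weights=None):
    """Weighted sum of per-subnet MSE terms plus alpha * sum of squared parameters.

    w_outs / d_outs : lists of CSI-Net / Det-Net outputs (one entry per stage).
    params          : list of weight and bias arrays of the whole network.
    """
    n_terms = len(w_outs) + len(d_outs)
    weights = np.ones(n_terms) if weights is None else np.asarray(weights, dtype=float)
    terms = [np.mean((w - w_label) ** 2) for w in w_outs]
    terms += [np.mean((d - d_label) ** 2) for d in d_outs]
    data_term = float(np.dot(weights, terms))
    penalty = alpha * sum(float(np.sum(p ** 2)) for p in params)
    return data_term + penalty
```

Summing the losses of all six subnets into one objective is what makes the training joint: gradients from the final outputs and from the intermediate stages update the same parameter set.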
To select an effective and feasible regularization coefficient and to verify the generalization performance of the detection network and reconstruction network, Fig 4 compares the convergence behaviors of Loss_Det and Loss_Rec under different regularization coefficients (i.e., $\alpha_1, \alpha_2 = 10^{-9}, 10^{-8}, \ldots, 10^{-4}$). From Fig 4, the converged values of the training loss and validation loss are almost the same, which indicates good generalization performance of the detection and reconstruction networks. In addition, a smaller value of $\alpha_1$ (or $\alpha_2$) leads to a smaller converged value of the training loss or validation loss. However, according to Eq (21), the value of Loss_Rec itself depends on $\alpha_2$, so the $\alpha_2$ that minimizes Loss_Rec may not achieve the best reconstruction performance. The optimized $\alpha_2$ is therefore determined by the reconstruction performance of the downlink CSI, which will be given in the experimental analysis.
By using the trained detection network and reconstruction network, the UL-US $\dot{\mathbf{d}}_u$ and downlink CSI $\dot{\mathbf{h}}_u$ can be recovered with the proposed scheme. Compared with the 1-bit CS-based superimposed CSI feedback scheme in [10], the recoveries of both the UL-US and downlink CSI are improved by the proposed scheme, while the requirement for second-order statistics of the noise is avoided. Besides, these improvements are robust against parameter variations, which will be presented in the experimental analysis.

Experiment results
In this section, we give numerical results of the proposed scheme. Definitions and basic parameters involved in simulations are first given. Then, to verify the effectiveness of the proposed scheme, we show the bit error rate (BER) of UL-US and MFV, and the normalized mean squared error (NMSE) of reconstructed downlink CSI is presented. Finally, we compare the online running time between the proposed scheme and conventional scheme. The source code is available at https://github.com/qingchj851/DL-1BitCS-SC-CSI-Feedback2.

Parameter setting
Definitions involved in simulations are given as follows. The signal-to-noise ratio (SNR) in decibel (dB) of the signal received at BS from user-u is defined as [19]

The NMSE is utilized to evaluate the recovery performance of the downlink CSI, and is defined as [19]
$$\mathrm{NMSE} = \frac{\left\|\mathbf{h}_u - \dot{\mathbf{h}}_u\right\|_2^2}{\left\|\mathbf{h}_u\right\|_2^2}.$$
In the experiment phase, P = 512, N = 64, and the sampling rate c is defined as c = M/N. The measurement matrix is randomly generated and obeys the Gaussian distribution [38], and it is guaranteed that its row vector and the column vector of the compressed signal cannot be sparsely represented by each other. The Walsh matrix generated by the Walsh sequence is employed as the spreading matrix $\mathbf{Q}_u$ [32]. The UL-US $\mathbf{d}_u$ is formed by applying QPSK modulation to randomly generated Bernoulli binary sequences. The training input data-sets are generated according to Eqs (1)-(9). The detection network and reconstruction network are trained under the noise-free setting, which differs from the training of the DL-based network in [19], where the training SNR is set to 5 dB. Testing data-sets are generated by using the same method as the training data-sets. The sizes of the training set, validation set, and testing set of the detection network are 60,000, 20,000, and 20,000, respectively. For the reconstruction network, 45,000, 15,000, and 15,000 samples are respectively employed for training, validation, and testing. In both the detection network and the reconstruction network, we use the Adam optimizer as the training optimization algorithm, and the number of epochs and the learning rate are set to 50 and 0.001, respectively. In the simulations, we stop the testing for BER performance when at least 1000 bit errors are observed [19,20]. For the convenience of expression, we use "Proposed" and "Ref [10]" to denote the proposed DL-based 1-bit superimposed CSI feedback and the traditional 1-bit superimposed feedback (mentioned in [10]), respectively.
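The dimensions implied by these settings and the NMSE metric can be checked with a short sketch. The helper names `feedback_dims` and `nmse` are hypothetical, and the NMSE below is computed for a single sample, whereas reported values would normally be averaged over the test set.

```python
import math
import numpy as np

def feedback_dims(N, c):
    """Return (M, number of feedback bits, MFV length L) for sampling rate c = M/N."""
    M = int(round(c * N))          # 1-bit measurements per real or imaginary part
    n_bits = 2 * M + N             # y_real, y_imag, and the support-set z
    L = math.ceil(n_bits / 2)      # QPSK carries two bits per symbol
    return M, n_bits, L

def nmse(h, h_est):
    """Normalized squared error between the true and reconstructed CSI."""
    return float(np.sum(np.abs(h - h_est) ** 2) / np.sum(np.abs(h) ** 2))
```

For N = 64 and c = 2.0 this gives M = 128, 320 feedback bits, and L = 160, so with P = 512 each MFV symbol is spread over P/L = 3.2 UL-US positions on average.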

BER performance
In this subsection, the effectiveness and robustness of the detection network will be verified. To clarify the effectiveness, the comparison of BER's performance between "Proposed" and "Ref [10]" is first presented in Fig 5. Next, to verify the robustness of the detection network, the impacts against the parameters of ρ and c are given in Figs 6 and 7, respectively.
To verify the effectiveness of the detection network, the BER performance of both the UL-US and the MFV is illustrated, since the UL-US is superimposed with the MFV. Fig 5 depicts the BER curves of the UL-US and MFV versus SNR, where c = 2.0 and ρ = 0.10 are considered. From Fig 5, the BERs of the UL-US and MFV obtained by "Proposed" are lower than those of "Ref [10]" over the whole given SNR region. For example, when SNR = 10 dB, the BER of the UL-US (or MFV) obtained by "Proposed" is around 3.4 × 10⁻³ (or 4.5 × 10⁻²), while that of "Ref [10]" is nearly 1.4 × 10⁻² (or 8.5 × 10⁻²). That is, compared with "Ref [10]", the BERs of both the UL-US and the MFV are improved by the proposed detection network. These improvements are especially significant in the relatively high SNR region. A possible reason is that the detection network is trained under the noise-free setting.
To verify the robustness of the BER improvement against the impact of ρ, the BER curves with different values of ρ, i.e., ρ = 0.05, ρ = 0.10, and ρ = 0.15, are plotted in Fig 6, where c = 2.0 is considered. From Fig 6, for each given ρ, the BERs of the UL-US and MFV of "Proposed" are lower than those of "Ref [10]". This reflects that the proposed detection network improves the BER performance of both the UL-US and the MFV under different values of ρ. As ρ increases from 0.05 to 0.15 for "Proposed", the BER of the UL-US increases while the BER of the MFV decreases, and vice versa. The reason is that an increased (or decreased) ρ allocates more (or less) transmission power to the downlink CSI feedback, i.e., the MFV, and correspondingly less (or more) to the UL-US.

In Fig 7, for each given c, the BERs of the UL-US and MFV of "Proposed" are lower than those of "Ref [10]". This implies that the proposed detection network improves the BER performance of the UL-US and MFV relative to "Ref [10]" for different values of c. With the increase of c, for both "Proposed" and "Ref [10]", the BERs of both the UL-US and the MFV increase, and vice versa. The reason is that the spreading gain (i.e., P/M) decreases with the increase of c, and thus degrades the detection performance (similar results can be found in [19,20]). As a whole, compared with "Ref [10]", the BER improvements of the UL-US and MFV are evident for each given c. Thus, the proposed detection network is robust in improving the BER performance of the UL-US and MFV against the impact of c.
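The dependence of the spreading gain on c follows directly from c = M/N. A short worked computation with the stated P = 512 and N = 64 (the helper name is hypothetical) gives P/M = 4.00, 3.20, and about 2.67 for c = 2.0, 2.5, and 3.0, respectively:

```python
P, N = 512, 64  # values stated in the experimental setup


def spreading_gain(c: float, P: int = P, N: int = N) -> float:
    """Spreading gain P / M, with M = c * N measurements."""
    M = int(c * N)
    return P / M


for c in (2.0, 2.5, 3.0):
    M = int(c * N)
    print(f"c = {c}: M = {M}, P/M = {spreading_gain(c):.2f}")
```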
To sum up, according to Figs 5-7, the UL-US and MFV's BER performances of "Ref [10]" are effectively improved by the proposed detection network, and these improvements are robust against the impacts of ρ and c.
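The BER values reported in this subsection rely on the stopping rule stated in the experimental setup, i.e., testing continues until at least 1000 bit errors are observed. A minimal sketch of such a Monte Carlo loop is given below. The BPSK-over-AWGN link is only a toy stand-in for the detection network, and the batch size, the cap on the number of tested bits, and all names are assumptions made for illustration.

```python
import numpy as np


def estimate_ber(detect, snr_db, min_errors=1000, batch_bits=10_000,
                 max_bits=10**8, seed=0):
    """Accumulate bit errors batch by batch until `min_errors` is reached.

    `detect(bits, snr_db, rng)` must return the detected bits.
    `max_bits` is a safety cap (an assumption, not part of the stated rule).
    """
    rng = np.random.default_rng(seed)
    errors = 0
    total = 0
    while errors < min_errors and total < max_bits:
        bits = rng.integers(0, 2, size=batch_bits)
        bits_hat = detect(bits, snr_db, rng)
        errors += int(np.count_nonzero(bits != bits_hat))
        total += batch_bits
    return errors / total, errors, total


def bpsk_awgn_link(bits, snr_db, rng):
    """Toy link: BPSK over real AWGN with hard-decision detection."""
    symbols = 1.0 - 2.0 * bits                      # 0 -> +1, 1 -> -1
    sigma = np.sqrt(0.5 / 10 ** (snr_db / 10))      # noise std for Es/N0
    received = symbols + sigma * rng.standard_normal(bits.size)
    return (received < 0).astype(int)


ber, errors, total = estimate_ber(bpsk_awgn_link, snr_db=4.0)
print(f"BER = {ber:.4f} from {errors} errors in {total} bits")
```

Stopping on an error count rather than on a fixed number of bits keeps the relative accuracy of the estimate roughly constant across SNR values, at the cost of longer runs at high SNR.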

NMSE performance
With the detected MFV, the downlink CSI can be reconstructed by using the proposed reconstruction network. To validate the effectiveness of the proposed reconstruction network, NMSE curves of the downlink CSI recovered by the proposed reconstruction network and by SCA-BIHT [10] are first given in Fig 8. Then, to demonstrate the robustness of the reconstruction network, the NMSE performance under the impacts of ρ and c is shown in Figs 9 and 10, respectively. In addition, we present the influence of the regularization coefficient α_2 on the NMSE performance in Table 6.
In Fig 8, the NMSE curves of the downlink CSI recovery are depicted, where c = 2.0 and ρ = 0.10. The "Proposed" employs 8 iterations for the initial feature extraction, i.e., β = 8, followed by two dense layers. In contrast, different iteration numbers (i.e., β = 10, β = 20, β = 50, and β = 100) are considered for the SCA-BIHT algorithm of "Ref [10]". From Fig 8, when SNR ≤ 14 dB, the "Proposed" achieves the minimum NMSE, which is even lower than that of "Ref [10]" with β = 100. For example, when SNR = 12 dB, the NMSE of "Proposed" is about 8.94 × 10⁻², while that of "Ref [10]" with β = 100 is about 1.43 × 10⁻¹. That is, in the relatively low SNR region (e.g., SNR ≤ 14 dB), the two dense layers in the reconstruction network can replace 95 iterations of the SCA-BIHT algorithm while yielding a smaller NMSE, leading to a lower processing delay. For SNR ≥ 16 dB, the "Proposed" still outperforms "Ref [10]" with β = 10. Although its NMSE is slightly higher than that of "Ref [10]" with β = 50 and β = 100 in this region, this loss is compensated by avoiding the high processing delay of "Ref [10]". On the whole, the proposed reconstruction network has a lower processing delay than "Ref [10]" and shows a better NMSE performance in the relatively low SNR region. Therefore, the proposed reconstruction network is effective in improving the NMSE performance of "Ref [10]".
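To make the role of the iteration number β concrete, the following sketch runs a fixed number of iterations of the standard binary iterative hard thresholding (BIHT) algorithm. It is a generic stand-in and not the SCA-BIHT algorithm of [10] or its simplified version used in the "Proposed"; the dense layers that follow the initial feature extraction are omitted. The real-valued signal, the sparsity level K, the step size 1/M, and all names are assumptions made for illustration.

```python
import numpy as np


def hard_threshold(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of x and zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


def biht(y: np.ndarray, Phi: np.ndarray, k: int, beta: int) -> np.ndarray:
    """Run `beta` BIHT iterations on sign measurements y = sign(Phi @ x)."""
    m, n = Phi.shape
    tau = 1.0 / m                                   # step size (assumption)
    x = np.zeros(n)
    for _ in range(beta):
        residual = y - np.sign(Phi @ x)             # sign mismatches
        x = hard_threshold(x + tau * (Phi.T @ residual), k)
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x              # amplitude is not recoverable


rng = np.random.default_rng(1)
N, M, K = 64, 128, 4                                # K is an assumed sparsity
x_true = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=K) / np.sqrt(K)

Phi = rng.standard_normal((M, N))
y = np.sign(Phi @ x_true)                           # noise-free 1-bit measurements

x_init = biht(y, Phi, K, beta=8)                    # coarse estimate after 8 iterations
cosine = float(x_init @ x_true)
```

In a scheme of the kind described above, a coarse estimate such as `x_init` would be passed to a small learned network for refinement, so that the cost of the remaining iterations is replaced by a single forward pass.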
To verify the robustness of the NMSE improvement against the impact of ρ, the NMSE curves with different values of ρ (i.e., ρ = 0.05, ρ = 0.10, and ρ = 0.15) are plotted in Fig 9. From Fig 9, for each given ρ, the NMSE of the downlink CSI of "Proposed" is smaller than that of "Ref [10]". As ρ increases from 0.05 to 0.15, the NMSE decreases for both "Ref [10]" and "Proposed", and vice versa. The reason is that the downlink CSI obtains more transmission power with a larger value of ρ. In addition, with the increase of SNR, the curves gradually converge, because the NMSE is mainly influenced by the superimposed interference in the relatively high SNR region. On the whole, for each given value of ρ in Fig 9, the NMSE of "Ref [10]" is reduced by the "Proposed", especially in the relatively low SNR region (e.g., SNR ≤ 14 dB). Thus, the proposed reconstruction network is robust in improving the NMSE performance against the impact of ρ.

Fig 10 plots the NMSE curves of the downlink CSI with different values of the compression rate c (i.e., c = 2.0, c = 2.5, and c = 3.0) to validate the robustness of the NMSE improvement against the impact of c. In Fig 10, for each given c, the NMSE of the downlink CSI of "Proposed" is smaller than that of "Ref [10]". In addition, for SNR ≤ 10 dB, the NMSE of "Proposed" increases as c increases. A possible reason is that a higher compression rate results in a lower spreading gain (i.e., P/M). In the low SNR region, the NMSE is mainly affected by the noise interference and is limited by the low spreading gain. Yet, the value to which the NMSE converges is smaller for a high compression rate than for a low one. For example, for c = 2.0, c = 2.5, and c = 3.0, the convergence values of the NMSE of "Proposed" are about 6.0 × 10⁻², 4.9 × 10⁻², and 4.4 × 10⁻², respectively. A possible reason is that a higher compression rate provides more information for reconstruction in the high SNR region, where the noise interference almost disappears. On the whole, for each given value of c in Fig 10, the NMSE of "Ref [10]" is reduced by the "Proposed". Thus, the proposed reconstruction network is robust in improving the NMSE performance against the impact of c.
To sum up, according to Figs 8-10, the downlink CSI's NMSE performance of "Ref [10]" is effectively improved by the proposed reconstruction network, and these improvements are robust against the impacts of ρ and c.

Online running time
To illustrate the low processing delay of "Proposed", i.e., of the detection network and the reconstruction network, the online running time of "Proposed" and "Ref [10]" is compared in Fig 11, where P = 512, ρ = 0.10, and different values of c (i.e., c = 2.0, c = 2.5, and c = 3.0) are considered. In particular, "Ref [10]" adopts β = 10 and β = 100 in the reconstruction algorithm (i.e., the SCA-BIHT algorithm). Here, β = 10 is the setting of "Ref [10]" for which the NMSE of "Proposed" is guaranteed to be smaller, and β = 100 is the setting for which "Proposed" has a similar NMSE (in the relatively high SNR region) but a significantly lower processing delay. For a fair comparison, 10⁵ online-running experiments are conducted for "Proposed" and "Ref [10]" on the same PC (with CPU i5-8250U) by using MATLAB software. For each given c in Fig 11, the online running time of "Proposed" is shorter than that of "Ref [10]". For example, when c = 2.0, the online running time of "Proposed" is 75.1 s, while that of "Ref [10]" is 201.8 s with β = 10 and 1266.9 s with β = 100. This reflects that the proposed 1-bit CS-based superimposed CSI feedback can reduce the processing delay. It is also noticed that, as c rises from 2.0 to 3.0, the online running time of both "Proposed" and "Ref [10]" goes up. However, the total increase in running time of "Proposed" is 15.9 s, which is far less than that of "Ref [10]" (i.e., 54.7 s for β = 10 and 374.0 s for β = 100). In addition, Fig 11 shows that the online running time of "Ref [10]" grows with the number of iterations. Thus, the NMSE performance that "Ref [10]" attains with a large number of iterations might not be achievable in delay-sensitive applications, while the "Proposed" avoids this problem. As a whole, compared with "Ref [10]", the proposed DL-based 1-bit superimposed CSI feedback significantly reduces the online running time.
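The running times quoted above for c = 2.0 can be turned into speed-up factors with simple arithmetic. The sketch below uses only the reported numbers; the two-point affine fit of the "Ref [10]" running time versus β is an assumption made for illustration, not a result of this paper. It yields speed-ups of about 2.7 (β = 10) and about 16.9 (β = 100), and a fitted cost of roughly 11.8 s per iteration on top of a fixed cost of roughly 83 s over the 10⁵ runs.

```python
# Reported online running times for c = 2.0 (seconds, over 1e5 runs).
t_proposed = 75.1
t_ref = {10: 201.8, 100: 1266.9}        # keyed by iteration number beta

# Speed-up of "Proposed" over "Ref [10]" for each beta.
speedup = {beta: t / t_proposed for beta, t in t_ref.items()}

# Two-point affine fit t(beta) = intercept + slope * beta (assumption).
slope = (t_ref[100] - t_ref[10]) / (100 - 10)
intercept = t_ref[10] - slope * 10

# Reported increase in running time as c rises from 2.0 to 3.0 (seconds).
increase = {"Proposed": 15.9, "Ref beta=10": 54.7, "Ref beta=100": 374.0}

for beta, s in speedup.items():
    print(f"beta = {beta}: speed-up = {s:.2f}x")
print(f"slope = {slope:.2f} s per iteration, intercept = {intercept:.2f} s")
```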

Conclusion
The 1-bit CS-based superimposed CSI feedback still faces many challenges, such as low recovery accuracy of the UL-US and downlink CSI, and long processing delay. To remedy these defects, a DL-based 1-bit superimposed CSI feedback scheme has been investigated in this paper. The constructed detection network obtains optimized network parameters through joint training, and thus improves the BER performance of the UL-US. Moreover, the detection network is also helpful for reconstructing the downlink CSI. With the downlink CSI bits detected by the detection network, the proposed reconstruction network combines a simplified version of SCA-BIHT with a single hidden layer network, and achieves a significant improvement in the NMSE performance of the downlink CSI recovery. In particular, compared with the conventional 1-bit CS-based superimposed CSI feedback, the proposed CSI feedback scheme is robust against parameter variations and has a significantly lower processing delay.